Bounds on the Bias of Signal 
Parameter Estimators 

By JACOB ZIV 

(Manuscript received December 20, 1968) 

Any estimator which is constrained to take values in a finite range is,
in general, biased. Often the bias is unknown; furthermore, in some
cases the bias may become the main contributor to the mean square error
of an estimator. This paper derives upper and lower bounds on the bias
of a finite-range signal parameter estimator.

I. INTRODUCTION AND MAIN RESULTS 

1.1 Introduction 

Let the parameter be denoted by $\alpha$ and let $\alpha$ take values in
$[-a, a]$. We refer to $2a$ as the a priori range (or space) of $\alpha$.
We assume that there exists a probabilistic mapping from the parameter
space to an observation space, that is, a probability law that governs
the effect of $\alpha$ on the observation.¹ This probability law will be
referred to as the "channel." After observing the "outcome," which is a
point in the observation space, we estimate the value of $\alpha$. Let
this estimate be denoted by $\hat{\alpha}$. Clearly, $\hat{\alpha}$ is a
random variable.

We assume, throughout this paper, that $\hat{\alpha}$ takes values in
$[-A, A]$.

Let the bias be defined as

$$
b(\alpha) = E_{\alpha}[\hat{\alpha} - \alpha]
          = \int (\hat{\alpha} - \alpha)\, dp(\hat{\alpha} \mid \alpha) \tag{1}
$$

where $p(\hat{\alpha} \mid \alpha)$ is the probability distribution
function of $\hat{\alpha}$ given $\alpha$.
Assume that we are now told that the true value of the parameter $\alpha$
is either $\alpha_1$ or $-\alpha_1$, with equal probabilities. Let
$H_{\alpha_1}$ be the hypothesis that $\alpha = \alpha_1$ and let
$H_{-\alpha_1}$ be the hypothesis that $\alpha = -\alpha_1$. The minimum
probability of error is (dropping the subscript 1 from $\alpha_1$):

$$
P_e\{-\alpha; \alpha\} = \min \tfrac{1}{2}\left[ \Pr\{\alpha \mid -\alpha\} + \Pr\{-\alpha \mid \alpha\} \right]
$$

where $\Pr\{\alpha \mid -\alpha\}$ is the probability that the decision
will be $\alpha$, given that $-\alpha$ is transmitted, and where the
minimization is carried over all possible decision rules. (A decision
rule is a mapping from the observation space to the set
$\{-\alpha, \alpha\}$.) Then we show in Section II that

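$$
\tfrac{1}{2}[b(\alpha) - b(-\alpha)] \le -A P_e\{-\alpha; \alpha\} + (A - \alpha); \quad \alpha \ge 0; \tag{2a}
$$

$$
\tfrac{1}{2}[b(\alpha) - b(-\alpha)] \ge A P_e\{-\alpha; \alpha\} - (A + \alpha); \quad \alpha \ge 0. \tag{2b}
$$

By equation (2a) we have

$$
\tfrac{1}{2}\left[\, |b(-\alpha)| + |b(\alpha)| \,\right] \ge A P_e\{-\alpha; \alpha\} - (A - \alpha); \quad \alpha \ge 0.
$$

Hence, an estimator must have a nonzero bias if

$$
\frac{\alpha}{A} > 1 - P_e\{-\alpha; \alpha\}. \tag{3}
$$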



The bounds of equation (2) are the main result of this paper. 
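
The bounds (2a) and (2b) and the condition (3) are simple to evaluate once the minimum error probability is known. The following Python sketch does so for a given nonnegative parameter value. The function and argument names (`bias_difference_bounds`, `bias_is_forced`, `p_e`) are illustrative choices, not part of the paper, and the numerical inputs are arbitrary.

```python
def bias_difference_bounds(alpha, A, p_e):
    """Bounds on 0.5 * [b(alpha) - b(-alpha)] for alpha >= 0.

    Returns (lower, upper), where lower is the right-hand side of (2b)
    and upper is the right-hand side of (2a).
    """
    if alpha < 0.0 or A <= 0.0:
        raise ValueError("expected alpha >= 0 and A > 0")
    if not 0.0 <= p_e <= 0.5:
        # With equiprobable hypotheses the minimum error probability
        # cannot exceed 1/2.
        raise ValueError("p_e must lie in [0, 0.5]")
    upper = -A * p_e + (A - alpha)   # equation (2a)
    lower = A * p_e - (A + alpha)    # equation (2b)
    return lower, upper


def bias_is_forced(alpha, A, p_e):
    """Condition (3): a nonzero bias is unavoidable if alpha/A > 1 - p_e."""
    return alpha / A > 1.0 - p_e


# Arbitrary illustrative values.
lower, upper = bias_difference_bounds(alpha=0.9, A=1.0, p_e=0.2)
print(lower, upper)
print(bias_is_forced(0.9, 1.0, 0.2))  # True, since 0.9 > 0.8
```

A negative upper bound, as in this example, is exactly the situation in which condition (3) holds.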

If we assume that for any $\alpha$, $b(\alpha) = -b(-\alpha)$, we have
by equation (2) that

$$
b(\alpha) \le -A P_e\{-\alpha; \alpha\} + (A - \alpha) \tag{4a}
$$

and

$$
b(\alpha) \ge A P_e\{-\alpha; \alpha\} - (A + \alpha). \tag{4b}
$$

These bounds are sketched in Fig. 1. 

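If, in addition, we assume a symmetry around $\alpha$ in the sense that

$$
\int_{-A + |\alpha| \,\le\, \hat{\alpha} - \alpha \,\le\, A - |\alpha|} (\hat{\alpha} - \alpha)\, dp(\hat{\alpha} \mid \alpha) = 0, \tag{5a}
$$

we show (see Section II) that in this case

$$
b(\alpha) \ge -\alpha P_e\{-\alpha; \alpha\}; \quad -A \le \alpha \le -\frac{A}{2} \tag{5b}
$$

$$
(A - \alpha) P_e\{-\alpha; \alpha\} \ge b(\alpha) \ge 0; \quad -\frac{A}{2} \le \alpha \le 0 \tag{5c}
$$

and

$$
b(\alpha) \le -\alpha P_e\{-\alpha; \alpha\}; \quad A \ge \alpha \ge \frac{A}{2} \tag{5d}
$$

$$
-(A + \alpha) P_e\{-\alpha; \alpha\} \le b(\alpha) \le 0; \quad \frac{A}{2} \ge \alpha \ge 0. \tag{5e}
$$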

The bias $b(\alpha)$ is unknown, in general. However, the probability
$P_e\{-\alpha; \alpha\}$ is known for many important cases. The bounds
derived here depend on the channel probability law through
$P_e\{-\alpha; \alpha\}$ only, and therefore are easy to compute in many
cases.

Fig. 1. Bounds on the bias of an estimator.

1.2 Sharpness 

Section III shows that, for one special case, $b(\alpha)$ is given by

$$
b(\alpha) = -2A P_e\{-\alpha; \alpha\} + (A - \alpha); \quad \alpha \ge 0, \tag{6a}
$$

$$
b(\alpha) = 2A P_e\{-\alpha; \alpha\} - (A + \alpha); \quad \alpha \le 0. \tag{6b}
$$

Section III also shows that, for another special case, $b(\alpha)$ is
given by

$$
b(\alpha) = 2A P_e\{-\alpha; \alpha\} - (A + \alpha); \quad \alpha \ge 0, \tag{6c}
$$

$$
b(\alpha) = -2A P_e\{-\alpha; \alpha\} + (A - \alpha); \quad \alpha < 0. \tag{6d}
$$

The comparison of equation (6) with the bounds of equations (4) and
(2) indicates the degree of sharpness of these bounds (see Fig. 1).
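
To make the comparison concrete, the sketch below evaluates the bounds (4a) and (4b) next to the special-case biases (6a) and (6c) for nonnegative parameter values. The function names and the numerical values of `A` and `p_e` are illustrative assumptions. In this setting each bound differs from the corresponding special-case bias by `A * p_e`.

```python
def bound_4a(alpha, A, p_e):
    """Upper bound (4a) on b(alpha)."""
    return -A * p_e + (A - alpha)


def bound_4b(alpha, A, p_e):
    """Lower bound (4b) on b(alpha)."""
    return A * p_e - (A + alpha)


def bias_6a(alpha, A, p_e):
    """Bias of the first special case, equation (6a), alpha >= 0."""
    return -2.0 * A * p_e + (A - alpha)


def bias_6c(alpha, A, p_e):
    """Bias of the second special case, equation (6c), alpha >= 0."""
    return 2.0 * A * p_e - (A + alpha)


A, p_e = 1.0, 0.25  # arbitrary illustrative values
for alpha in (0.0, 0.25, 0.5, 1.0):
    gap_upper = bound_4a(alpha, A, p_e) - bias_6a(alpha, A, p_e)
    gap_lower = bias_6c(alpha, A, p_e) - bound_4b(alpha, A, p_e)
    print(alpha, gap_upper, gap_lower)
```

The gap shrinks with the error probability, so the bounds are tightest when the two hypotheses are easy to distinguish.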

1.3 Examples 
Let the received message be a sample function of the random process

$$
r(t) = s(t - \alpha) + n(t), \tag{7a}
$$

where $\alpha$ is an unknown parameter constrained to take values in
$(-a, a)$. The term $n(t)$ is assumed to be white Gaussian noise with
(double sided) spectral density $N_0$.
Let

$$
\int_{-\infty}^{\infty} s^2(t)\, dt = E, \tag{7b}
$$

$$
\rho(2\alpha) = \frac{1}{E} \int_{-\infty}^{\infty} s(t - \alpha)\, s(t + \alpha)\, dt, \tag{7c}
$$

$$
q = \left( [1 - \rho(2\alpha)]\, E / 2N_0 \right)^{1/2}. \tag{7d}
$$

Hence, in this case,²

$$
P_e\{-\alpha; \alpha\} = (2\pi)^{-1/2} \int_{q}^{\infty} \exp(-x^2/2)\, dx
\ge (2\pi)^{-1/2} \int_{(E/N_0)^{1/2}}^{\infty} \exp(-x^2/2)\, dx. \tag{8a}
$$

Hence, by equation (3), an estimator must have a nonzero bias if

$$
\frac{\alpha}{A} > 1 - (2\pi)^{-1/2} \int_{(E/N_0)^{1/2}}^{\infty} \exp(-x^2/2)\, dx. \tag{8b}
$$
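
The Gaussian tail integrals in (8a) and (8b) can be evaluated with the complementary error function. The sketch below is one way to do so with Python's standard library. The function names are illustrative, and `rho` stands for the correlation coefficient of (7c), which the caller must supply for the signal in use.

```python
import math


def q_function(x):
    """Gaussian tail: (2*pi)**-0.5 * integral from x to infinity of exp(-t*t/2)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def min_error_probability(rho, E, N0):
    """P_e{-alpha; alpha} from (7d) and the first line of (8a)."""
    q = math.sqrt((1.0 - rho) * E / (2.0 * N0))
    return q_function(q)


def error_probability_floor(E, N0):
    """Lower bound in (8a); it is attained when rho = -1."""
    return q_function(math.sqrt(E / N0))


def forced_bias_threshold(E, N0):
    """Right-hand side of (8b): bias is unavoidable when alpha/A exceeds it."""
    return 1.0 - error_probability_floor(E, N0)


E, N0 = 4.0, 1.0  # arbitrary illustrative signal energy and noise density
for rho in (-1.0, 0.0, 0.5, 1.0):
    print(rho, min_error_probability(rho, E, N0))
print(forced_bias_threshold(E, N0))
```

Because the floor does not depend on the signal shape, (8b) can be checked from the ratio of signal energy to noise density alone.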

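Furthermore, if the channel is that of equation (7) and if
$\hat{\alpha}$ is a maximum likelihood estimator, then it follows from
equation (7) that the conditions of equation (5a) are satisfied, since
the maximum likelihood procedure for estimating $\alpha$ is to evaluate

$$
\lambda(\alpha^*) = \int_{-\infty}^{\infty} r(t)\, s(t - \alpha^*)\, dt
= \int_{-\infty}^{\infty} s(t - \alpha)\, s(t - \alpha^*)\, dt + \int_{-\infty}^{\infty} n(t)\, s(t - \alpha^*)\, dt
$$

and to set $\hat{\alpha}$ to the value of $\alpha^*$
($-A \le \alpha^* \le A$) for which $\lambda(\alpha^*)$ is maximum.
Hence, the statistics of $\lambda(\alpha_1^*)$ are the same as those of
$\lambda(\alpha_2^*)$ if $\tfrac{1}{2}(\alpha_1^* + \alpha_2^*) = \alpha$;
also, $b(\alpha) = -b(-\alpha)$. Therefore by equations (5) and (8a),

$$
b(\alpha) \ge -\alpha\, (2\pi)^{-1/2} \int_{(E/N_0)^{1/2}}^{\infty} \exp(-x^2/2)\, dx; \quad -A \le \alpha \le -\frac{A}{2} \tag{9a}
$$

$$
b(\alpha) \ge 0; \quad -\frac{A}{2} \le \alpha \le 0 \tag{9b}
$$

$$
b(\alpha) \le -\alpha\, (2\pi)^{-1/2} \int_{(E/N_0)^{1/2}}^{\infty} \exp(-x^2/2)\, dx; \quad \frac{A}{2} \le \alpha \le A \tag{9c}
$$

$$
b(\alpha) \le 0; \quad 0 < \alpha \le \frac{A}{2}. \tag{9d}
$$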

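II. DERIVATION OF THE BOUNDS³

By equation (1),

$$
b(\alpha) = \int (\hat{\alpha} - \alpha)\, dp(\hat{\alpha} \mid \alpha)
= \int_{\hat{\alpha} > 0} (\hat{\alpha} - \alpha)\, dp(\hat{\alpha} \mid \alpha)
+ \int_{\hat{\alpha} \le 0} (\hat{\alpha} - \alpha)\, dp(\hat{\alpha} \mid \alpha). \tag{10}
$$

Now,

$$
\int_{\hat{\alpha} > 0} (\hat{\alpha} - \alpha)\, dp(\hat{\alpha} \mid \alpha) \ge -\alpha \Pr\{\hat{\alpha} > 0 \mid \alpha\} \tag{11a}
$$

$$
\int_{\hat{\alpha} > 0} (\hat{\alpha} - \alpha)\, dp(\hat{\alpha} \mid \alpha) \le (A - \alpha) \Pr\{\hat{\alpha} > 0 \mid \alpha\} \tag{11b}
$$

$$
\int_{\hat{\alpha} \le 0} (\hat{\alpha} - \alpha)\, dp(\hat{\alpha} \mid \alpha) \ge -(A + \alpha) \Pr\{\hat{\alpha} \le 0 \mid \alpha\} \tag{11c}
$$

$$
\int_{\hat{\alpha} \le 0} (\hat{\alpha} - \alpha)\, dp(\hat{\alpha} \mid \alpha) \le -\alpha \Pr\{\hat{\alpha} \le 0 \mid \alpha\}. \tag{11d}
$$

Also, we have that

$$
\Pr\{\hat{\alpha} > 0 \mid \alpha\} = 1 - \Pr\{\hat{\alpha} \le 0 \mid \alpha\}. \tag{12}
$$

Inserting equations (11) and (12) into equation (10), we have

$$
b(\alpha) \ge A \Pr\{\hat{\alpha} > 0 \mid \alpha\} - (A + \alpha) \tag{13}
$$

$$
b(\alpha) \le -A \Pr\{\hat{\alpha} \le 0 \mid \alpha\} + (A - \alpha). \tag{14}
$$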
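
The substitution can be written out explicitly. Here $P_+ = \Pr\{\hat{\alpha} > 0 \mid \alpha\}$ and $P_- = \Pr\{\hat{\alpha} \le 0 \mid \alpha\} = 1 - P_+$ are shorthands introduced only for this worked step; they do not appear elsewhere in the paper.

$$
\begin{aligned}
b(\alpha) &\ge -\alpha P_+ - (A + \alpha)(1 - P_+) = A P_+ - (A + \alpha)
  && \text{by (11a), (11c), (12)}, \\
b(\alpha) &\le (A - \alpha)(1 - P_-) - \alpha P_- = -A P_- + (A - \alpha)
  && \text{by (11b), (11d), (12)}.
\end{aligned}
$$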

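Consider the following detection problem. Assume that $\alpha$ and
$-\alpha$ ($\alpha > 0$) are used as two signals for equiprobable binary
signalling; decide on $\alpha$ if $\hat{\alpha} > 0$ and decide on
$-\alpha$ if $\hat{\alpha} \le 0$. The probability of error associated
with this detection procedure is given by

$$
P_\alpha = \tfrac{1}{2} \Pr\{\hat{\alpha} > 0 \mid -\alpha\} + \tfrac{1}{2} \Pr\{\hat{\alpha} \le 0 \mid \alpha\}; \quad \alpha > 0. \tag{15}
$$

The error probability $P_\alpha$ is lower bounded by
$P_e\{-\alpha; \alpha\}$, which is the probability of error that is
associated with the optimal binary detection scheme for this detection
problem. In the same way, $P_\alpha$ is upper bounded by
$1 - P_e\{-\alpha; \alpha\}$, since the rule with the two decisions
interchanged has error probability $1 - P_\alpha$, which also cannot be
smaller than $P_e\{-\alpha; \alpha\}$.
Hence,

$$
1 - P_e\{-\alpha; \alpha\} \ge \tfrac{1}{2}\left[ \Pr\{\hat{\alpha} > 0 \mid -\alpha\} + \Pr\{\hat{\alpha} \le 0 \mid \alpha\} \right] \ge P_e\{-\alpha; \alpha\}; \quad \alpha \ge 0. \tag{16}
$$

By equations (13), (14), and (16) we get equations (2a) and (2b).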
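
Equations (13) and (14) hold for any conditional distribution of the estimate on the finite range. The sketch below checks them exactly for a small discrete distribution. The distribution, the function name `bias_and_bounds`, and the parameter values are hypothetical, and the same distribution is reused for every parameter value purely for illustration.

```python
def bias_and_bounds(values, probs, alpha, A):
    """Exact bias of a discrete estimator together with bounds (13) and (14).

    values: possible estimator outputs, each in [-A, A]
    probs:  conditional probabilities of those outputs given alpha
    """
    if any(abs(v) > A for v in values):
        raise ValueError("estimates must lie in [-A, A]")
    if abs(sum(probs) - 1.0) > 1e-12:
        raise ValueError("probabilities must sum to 1")
    bias = sum(p * (v - alpha) for v, p in zip(values, probs))
    p_pos = sum(p for v, p in zip(values, probs) if v > 0)
    lower = A * p_pos - (A + alpha)            # equation (13)
    upper = -A * (1.0 - p_pos) + (A - alpha)   # equation (14)
    return bias, lower, upper


values = [-1.0, -0.5, 0.25, 1.0]  # hypothetical estimator outputs
probs = [0.1, 0.2, 0.3, 0.4]      # hypothetical conditional probabilities
for alpha in (-0.8, 0.0, 0.3, 0.8):
    bias, lower, upper = bias_and_bounds(values, probs, alpha, A=1.0)
    print(alpha, lower, bias, upper)
```

Each printed row should show the bias lying between the two bounds.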
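Now, if

$$
\int_{-A + |\alpha| \,\le\, \hat{\alpha} - \alpha \,\le\, A - |\alpha|} (\hat{\alpha} - \alpha)\, dp(\hat{\alpha} \mid \alpha) = 0
$$

then

$$
b(\alpha) = \int_{2\alpha + A}^{A} (\hat{\alpha} - \alpha)\, dp(\hat{\alpha} \mid \alpha) \ge 0; \quad -A \le \alpha \le 0. \tag{17a}
$$

Hence

$$
b(\alpha) \ge -\alpha \Pr[\hat{\alpha} > 0 \mid \alpha]; \quad -A \le \alpha \le -\frac{A}{2} \tag{17b}
$$

$$
b(\alpha) \le (A - \alpha) \Pr[\hat{\alpha} > 0 \mid \alpha]; \quad -\frac{A}{2} \le \alpha \le 0. \tag{17c}
$$

Also

$$
b(\alpha) = \int_{-A}^{2\alpha - A} (\hat{\alpha} - \alpha)\, dp(\hat{\alpha} \mid \alpha) \le 0; \quad A \ge \alpha \ge 0. \tag{17d}
$$

Hence

$$
b(\alpha) \le -\alpha \Pr[\hat{\alpha} \le 0 \mid \alpha]; \quad \frac{A}{2} \le \alpha \le A \tag{17e}
$$

$$
b(\alpha) \ge -(A + \alpha) \Pr[\hat{\alpha} \le 0 \mid \alpha]; \quad 0 \le \alpha \le \frac{A}{2}. \tag{17f}
$$

Equation (5) follows from equations (17) and (11).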

III. THE SHARPNESS OF THE BOUNDS 

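In order to check the sharpness of the bounds on $b(\alpha)$, let us
discuss the following example.

Let $\hat{\alpha}$ be some estimate of the parameter $\alpha$.
Let $\tilde{\alpha}$ be defined as

$$
\tilde{\alpha} = A \quad \text{if } \hat{\alpha} > 0
$$

$$
\tilde{\alpha} = -A \quad \text{if } \hat{\alpha} \le 0.
$$

Now, regard $\tilde{\alpha}$ as an estimate of $\alpha$. The bias of
$\tilde{\alpha}$ is given by

$$
b(\alpha) = (A - \alpha) \Pr[\hat{\alpha} > 0 \mid \alpha] + (-A - \alpha)\left[1 - \Pr(\hat{\alpha} > 0 \mid \alpha)\right]
= 2A \Pr\{\hat{\alpha} > 0 \mid \alpha\} - (A + \alpha); \quad \alpha \le 0 \tag{18}
$$

and also

$$
b(\alpha) = -2A \Pr\{\hat{\alpha} \le 0 \mid \alpha\} + (A - \alpha); \quad \alpha > 0. \tag{19}
$$

Compare equations (18) and (19) with equations (13) and (14).

In the special case where $\hat{\alpha}$ is a maximum likelihood
estimator and the channel is the one given by equation (7), we have that

$$
\Pr[\hat{\alpha} \le 0 \mid \alpha] = \Pr\{\hat{\alpha} > 0 \mid -\alpha\} = P_e\{-\alpha; \alpha\}; \quad \alpha \ge 0. \tag{20}
$$

Inserting equation (20) into equations (18) and (19) yields equations
(6a) and (6b). By making $\tilde{\alpha} = -A$ if $\hat{\alpha} > 0$ and
$\tilde{\alpha} = A$ if $\hat{\alpha} \le 0$ we get equations (6c) and
(6d) in a similar way.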
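
The algebra behind (18) and (19) can be checked numerically. The sketch below computes the bias of the two-level estimator directly from its two outcomes and compares it with the closed forms and with the bounds (13) and (14). The function names are illustrative, and `p_pos` denotes the probability that the underlying estimate is positive, which the caller must supply.

```python
def two_level_bias(p_pos, alpha, A):
    """Bias of the estimator that outputs +A with probability p_pos
    and -A otherwise, computed directly as an expectation."""
    return (A - alpha) * p_pos + (-A - alpha) * (1.0 - p_pos)


def bias_18(p_pos, alpha, A):
    """Closed form of equation (18)."""
    return 2.0 * A * p_pos - (A + alpha)


def bias_19(p_pos, alpha, A):
    """Closed form of equation (19), using Pr{estimate <= 0} = 1 - p_pos."""
    return -2.0 * A * (1.0 - p_pos) + (A - alpha)


def lower_bound_13(p_pos, alpha, A):
    return A * p_pos - (A + alpha)


def upper_bound_14(p_pos, alpha, A):
    return -A * (1.0 - p_pos) + (A - alpha)


A = 1.0  # arbitrary illustrative range
for p_pos in (0.1, 0.5, 0.9):
    for alpha in (-0.5, 0.0, 0.5):
        b = two_level_bias(p_pos, alpha, A)
        print(p_pos, alpha, lower_bound_13(p_pos, alpha, A), b,
              upper_bound_14(p_pos, alpha, A))
```

The two-level bias exceeds the lower bound (13) by `A * p_pos` and falls short of the upper bound (14) by `A * (1 - p_pos)`.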

IV. APPLICATIONS 

4.1 Postdetection Integration 

Assume that one makes $n$ independent, equally distributed estimates of
$\alpha$: $\hat{\alpha}_1, \hat{\alpha}_2, \ldots, \hat{\alpha}_i, \ldots, \hat{\alpha}_n$,
and let

$$
\bar{\alpha} = \frac{1}{n} \sum_{i=1}^{n} \hat{\alpha}_i ;
$$

$\bar{\alpha}$ is sometimes called the "postdetection estimation of
$\alpha$". Such an estimator appears in many applications: radar range
estimation, postdetection diversity combiners in communication systems,
and so on.
Now

$$
\epsilon_\alpha^2 \triangleq E[(\bar{\alpha} - \alpha)^2 \mid \alpha]
= E[(\bar{\alpha} - b(\alpha) - \alpha)^2 \mid \alpha] + b^2(\alpha)
= \frac{1}{n} \sigma_\alpha^2 + b^2(\alpha)
$$

where $\sigma_\alpha^2$ is the variance of $\hat{\alpha}_i$ (for any
$i$), given $\alpha$. Clearly, if the estimator $\hat{\alpha}$ is
unbiased, the mean square error that is associated with $\bar{\alpha}$
can be made arbitrarily small by making $n$ large enough. However, if
$\hat{\alpha}$ is biased, then, for any $n$, the mean square error is
lower bounded by

$$
\epsilon_\alpha^2 \ge b^2(\alpha) \ge b_L^2(\alpha) \tag{21}
$$

where $b_L(\alpha)$ is the lower bound on $|b(\alpha)|$ given by
equations (2), (4), or (5).
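
The decomposition of the mean square error into a variance term that decays with the number of estimates and a bias term that does not can be illustrated as follows. The function name and the variance and bias values are hypothetical.

```python
def postdetection_mse(n, variance, bias):
    """Mean square error of the average of n independent, identically
    distributed estimates: variance / n + bias ** 2."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return variance / n + bias ** 2


variance, bias = 4.0, 0.5  # hypothetical per-estimate variance and bias
for n in (1, 10, 100, 10_000):
    print(n, postdetection_mse(n, variance, bias))
```

As the number of estimates grows, the printed error approaches the squared bias and never drops below it, which is the content of (21).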
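Example: Let the channel be given by equation (7) and let
$\hat{\alpha}$ be a maximum likelihood estimator; then by equations
(9a), (9c), and (21)

$$
\epsilon_\alpha^2 \ge \alpha^2\, \frac{1}{2\pi} \left[ \int_{(E/N_0)^{1/2}}^{\infty} \exp(-x^2/2)\, dx \right]^2; \quad \frac{A}{2} \le |\alpha| \le A.
$$

Assume that the a priori range of $\alpha$ is smaller than $(-A, A)$.
Then

$$
\epsilon_\alpha^2 \ge \frac{\alpha^2}{2\pi} \left[ \int_{(E/N_0)^{1/2}}^{\infty} \exp(-x^2/2)\, dx \right]^2; \quad \frac{A}{2} \le |\alpha| \le a.
$$

Now, let

$$
\epsilon^2 \triangleq \max_{\alpha} \epsilon_\alpha^2 .
$$

Then, unless $A \ge 2a$ (that is, unless the range of $\hat{\alpha}$ is
at least twice as large as the a priori range of $\alpha$), we have that

$$
\epsilon^2 \ge \frac{a^2}{2\pi} \left[ \int_{(E/N_0)^{1/2}}^{\infty} \exp(-x^2/2)\, dx \right]^2
$$

even if $n \to \infty$.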



4.2 Predetection Integration 

Let the channel be the one given by equation (7). Assume that the
estimation is based on $n$ repeated measurements; namely, the received
signal is given by

$$
r(t) = n(t) + \sum_{i=0}^{n-1} s(t - \alpha - i\,2A).
$$

In this case, an estimate is made only after observing the complete
received signal ("predetection integration"). It then follows that for
a maximum likelihood estimate of $\alpha$

$$
b^2(\alpha) \ge \alpha^2\, \frac{1}{2\pi} \left[ \int_{(nE/N_0)^{1/2}}^{\infty} \exp(-x^2/2)\, dx \right]^2; \quad \frac{A}{2} \le |\alpha| \le A
$$

which is the same as for a single measurement except for $E$ being
replaced by $nE$. In this case, unlike the previous case, the lower bound
vanishes as $n$ tends to infinity.
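
The contrast between the two integration schemes can be seen by evaluating both lower bounds for growing numbers of measurements. The sketch below assumes the Gaussian channel of equation (7) and a parameter magnitude between half the estimator range and the full range. The function names and numerical values are illustrative.

```python
import math


def q_function(x):
    """Gaussian tail: (2*pi)**-0.5 * integral from x to infinity of exp(-t*t/2)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def postdetection_floor(alpha, E, N0):
    """Lower bound on the mean square error after averaging per-measurement
    estimates; it does not depend on the number of measurements."""
    return (alpha * q_function(math.sqrt(E / N0))) ** 2


def predetection_floor(alpha, E, N0, n):
    """Same bound with E replaced by n * E, as in predetection integration."""
    return (alpha * q_function(math.sqrt(n * E / N0))) ** 2


alpha, E, N0 = 0.8, 1.0, 1.0  # arbitrary illustrative values
for n in (1, 4, 16, 64):
    print(n, postdetection_floor(alpha, E, N0),
          predetection_floor(alpha, E, N0, n))
```

The postdetection column stays constant while the predetection column decays rapidly toward zero.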